fix(otel-collector): default k8snode resource detection on on-prem K8s [CDS-2925] #443

Merged
natnayr merged 2 commits into main from fix/cds-2925-on-prem-k8snode-detector on Apr 22, 2026

natnayr commented Apr 22, 2026

Summary

Restores Coralogix Infra Catalog support on on-prem Kubernetes (e.g. EKSA) when users set provider: on-prem. Without this, the resource catalog cannot resolve pod-to-node relationships because the chart was emitting no k8s.node.* resource attributes for on-prem deployments.

Per Slack thread CDS-2925 (EKSA customer report).

Background

PR #411 (v0.129.2) made the resource detection processors provider-aware. For provider: on-prem it set the cloud detector list to empty in two places, which:

  • Skipped the entire resourcedetection/resource_catalog processor (gated by if gt (len $catalogDetectors) 0), so the logs/resource_catalog pipeline emitted nothing useful for the Infra Catalog.
  • Left resourcedetection/env with only [system, env], so k8s.node.* attributes never landed on logs/metrics/traces either.

K8S_NODE_NAME is already injected via the Downward API in templates/_pod.tpl for non-ECS / non-standalone / non-macOS distributions, so the k8snode detector works out of the box without further plumbing.
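
For reference, injecting the node name through the Downward API generally looks like the fragment below. This is a generic Kubernetes sketch, not a copy of templates/_pod.tpl: the container name is hypothetical, and only the K8S_NODE_NAME variable name comes from this PR.

```yaml
# Generic pod-spec fragment (sketch); not copied from templates/_pod.tpl.
spec:
  containers:
    - name: opentelemetry-collector     # hypothetical container name
      env:
        - name: K8S_NODE_NAME           # variable the k8snode detector is told to read
          valueFrom:
            fieldRef:
              fieldPath: spec.nodeName  # Downward API: name of the node running this pod
```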

Changes

charts/opentelemetry-collector/templates/_config.tpl:

  • opentelemetry-collector.kubernetesResourcesConfig: for provider: on-prem on Kubernetes, default $catalogDetectors to [k8snode] (was []). Add a k8snode: { node_from_env_var: K8S_NODE_NAME } block under resourcedetection/resource_catalog when that detector is in use.
  • opentelemetry-collector.resourceDetectionConfig: introduce $defaultEnvDetectors, which is [system, env] by default and [env, k8snode, system] for on-prem K8s. Add a sibling k8snode: config block under resourcedetection/env when that detector is in the list.
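
As a rough illustration of the intended result on on-prem Kubernetes, the rendered processors would look something like the sketch below. The detector lists and the node_from_env_var setting are taken from this description; any other fields the chart renders for these processors are omitted, and the k8snode block under resourcedetection/env is assumed to match the one under resourcedetection/resource_catalog.

```yaml
# Sketch of the relevant rendered collector config for provider: on-prem on Kubernetes.
processors:
  resourcedetection/resource_catalog:
    detectors: [k8snode]                # was skipped entirely when the list was empty
    k8snode:
      node_from_env_var: K8S_NODE_NAME  # read the node name from the injected env var
  resourcedetection/env:
    detectors: [env, k8snode, system]
    k8snode:
      node_from_env_var: K8S_NODE_NAME  # assumed to match the block above
```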

charts/opentelemetry-collector/values.yaml:

  • Drop the hardcoded default presets.resourceDetection.detectors.env: [system, env] so the template default applies (including the on-prem K8s default). Document the new behavior and the # env: [] opt-out in comments.

charts/opentelemetry-collector/examples/on-prem-k8s/:

  • New example covering distribution: "" + provider: on-prem + kubernetesResources + kubernetesAttributes + resourceDetection. Locks in the new defaults via rendered fixtures.
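
A values sketch for this combination might look as follows. It is not the contents of the new example file: the top-level placement of provider and the enabled flags on each preset are assumptions made for illustration, so check them against values.yaml.

```yaml
# Hypothetical values for the on-prem Kubernetes case; key placement is partly assumed.
distribution: ""     # plain Kubernetes (not ecs / standalone / macos)
provider: on-prem    # assumed to be a top-level key, as the wording above suggests
presets:
  kubernetesResources:
    enabled: true    # the `enabled` flags are assumptions
  kubernetesAttributes:
    enabled: true
  resourceDetection:
    enabled: true
    # detectors.env and detectors.cloud are left unset so the template defaults apply
```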

Chart.yaml / CHANGELOG.md:

  • Bump chart from 0.130.12 -> 0.130.13 with a CHANGELOG entry noting the fix and the v0.129.2 regression.

All other touched files are re-rendered fixtures from make generate-examples (chart version label bumps; the on-prem-k8s example is the only behavior change).

Backward compatibility

  • Users who set presets.resourceDetection.detectors.env or .cloud explicitly retain full control. Defaults only change when those values are unset.
  • For non-K8s on-prem (distribution: standalone / ecs / macos) behavior is unchanged.
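
A user who prefers the earlier detector list on on-prem Kubernetes could pin it explicitly. This is a minimal sketch using the key path named above; it assumes the resourceDetection preset is otherwise enabled.

```yaml
# Explicit override: the list is used as given, so no k8snode detector is added.
presets:
  resourceDetection:
    detectors:
      env: [system, env]
```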

Test plan

  • helm lint charts/opentelemetry-collector — passes
  • make check-examples CHARTS=opentelemetry-collector — all 60 examples diff-clean against make generate-examples output
  • ./charts/opentelemetry-collector/validate-configs.sh: on-prem-k8s validates; the only failures are pre-existing/environmental (standalone-systemd missing receiver in bundled binary, standalone-windows macOS sandbox hostmetrics issue)
  • On-prem-k8s rendered output verified to contain:
    • processors.resourcedetection/resource_catalog.detectors == [k8snode] with k8snode.node_from_env_var: K8S_NODE_NAME
    • processors.resourcedetection/env.detectors == [env, k8snode, system] with sibling k8snode: block
    • logs/resource_catalog pipeline includes resourcedetection/resource_catalog
    • K8S_NODE_NAME env var injected on the deployment container

Out of scope

  • No change in telemetry-shippers/otel-integration. The bot will pick up 0.130.13 on the next scheduled run via bump-otel-collector-version.sh.

Made with Cursor

…s [CDS-2925]

Reverts the regression introduced in v0.129.2 (PR #411) where setting
`provider: on-prem` on a Kubernetes distribution skipped the
`resourcedetection/resource_catalog` processor entirely and limited
`resourcedetection/env` to `[system, env]`. Without `k8s.node.*` resource
attributes, the Coralogix Infra Catalog could not establish pod-to-node
relationships on EKSA / self-managed Kubernetes clusters.

For `provider: on-prem` on a Kubernetes distribution, the chart now
defaults to:
- `resourcedetection/resource_catalog`: detectors `[k8snode]` with
  `node_from_env_var: K8S_NODE_NAME`
- `resourcedetection/env`: detectors `[env, k8snode, system]` with the
  matching `k8snode` block

`K8S_NODE_NAME` is already injected via the Downward API in
`_pod.tpl` for non-ECS / non-standalone / non-macOS distributions, so no
additional plumbing is required. Users who set
`presets.resourceDetection.detectors.env` or `.cloud` explicitly retain
full control.

Also adds an `examples/on-prem-k8s` fixture covering this configuration
and re-renders all examples for chart bump 0.130.12 -> 0.130.13.

Made-with: Cursor
natnayr requested review from a team, nicolastakashi and oded-dd as code owners on April 22, 2026 06:45

natnayr commented Apr 22, 2026

@codex review

@chatgpt-codex-connector

To use Codex here, create a Codex account and connect to github.

…-2925]

Address review feedback on PR #443.

`presets.resourceDetection.detectors.env: []` was previously coerced back
to the on-prem default by `| default $defaultEnvDetectors`, because Helm
treats an empty list as falsy. Use `kindIs "invalid"` to distinguish
"unset/null" from "explicitly empty", so users can opt out of
resourcedetection/env entirely with `env: []`, and so explicit lists like
`env: [system, env]` are respected verbatim on on-prem Kubernetes.

values.yaml now declares `env:` (null) with an honest comment explaining
the override semantics, and the JSON schema accepts ["array", "null"].

Verified with `helm template`:
- on-prem K8s, env unset         -> [env, k8snode, system] + k8snode block
- on-prem K8s, env: []           -> detectors: [], no k8snode block
- on-prem K8s, env: [system,env] -> [system, env], no k8snode block
- AWS EKS, env unset             -> [system, env], no k8snode block

No rendered example fixtures change (all examples leave `env` unset).
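
The distinction between "unset/null" and "explicitly empty" can be sketched as two alternative values overrides. The outcomes in the comments restate the `helm template` results listed above for on-prem Kubernetes; the two YAML documents are separate alternatives, not meant to be applied together.

```yaml
# Case 1: key declared with no value (null), equivalent to leaving it unset.
# The template default applies: [env, k8snode, system] plus the k8snode block.
presets:
  resourceDetection:
    detectors:
      env:
---
# Case 2: explicitly empty list, now kept as given instead of being replaced.
# Renders detectors: [] with no k8snode block.
presets:
  resourceDetection:
    detectors:
      env: []
```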

Made-with: Cursor
natnayr merged commit 95911f9 into main on Apr 22, 2026
7 checks passed
natnayr deleted the fix/cds-2925-on-prem-k8snode-detector branch on April 22, 2026 11:30